Bounding the Probability of Error for High Precision Recognition

Author: Huang Gary B.
Kae Andrew
Learned-Miller Erik
Publication date: 01/01/2009

We consider models for which it is important, early in processing, to estimate some variables with high precision, but perhaps at relatively low rates of recall. If some variables can be identified with near certainty, then they can be conditioned upon, allowing further inference to be done efficiently. Specifically, we consider optical character recognition (OCR) systems that can be bootstrapped by identifying a subset of correctly translated document words with very high precision. This "clean set" is subsequently used as document-specific training data. While many current OCR systems produce measures of confidence for the identity of each letter or word, thresholding these confidence values, even at very high values, still produces some errors. We introduce a novel technique for identifying a set of correct words with very high precision. Rather than estimating posterior probabilities, we bound the probability that any given word is incorrect under very general assumptions, using an approximate worst case analysis. As a result, the parameters of the model are nearly irrelevant, and we are able to identify a subset of words, even in noisy documents, of which we are highly confident. On our set of 10 documents, we are able to identify about 6% of the words on average without making a single error. This ability to produce word lists with very high precision allows us to use a family of models which depends upon such clean word lists.

arXiv.org e-Print Archive

CiteSeerX

ScholarWorks@UMass Amherst

Learned versus Hand-Designed Feature Representations for 3d Agglomeration

Author: Bogovic John A.
Huang Gary B.
Jain Viren
Publication date: 20/12/2013

For image recognition and labeling tasks, recent results suggest that machine learning methods that rely on manually specified feature representations may be outperformed by methods that automatically derive feature representations based on the data. Yet for problems that involve analysis of 3d objects, such as mesh segmentation, shape retrieval, or neuron fragment agglomeration, there remains a strong reliance on hand-designed feature descriptors. In this paper, we evaluate a large set of hand-designed 3d feature descriptors alongside features learned from the raw data using both end-to-end and unsupervised learning techniques, in the context of agglomeration of 3d neuron fragments. By combining unsupervised learning techniques with a novel dynamic pooling scheme, we show how pure learning-based methods are for the first time competitive with hand-designed 3d shape descriptors. We investigate data augmentation strategies for dramatically increasing the size of the training set, and show how combining both learned and hand-designed features leads to the highest accuracy.

arXiv.org e-Print Archive

CiteSeerX

Annotating Synapses in Large EM Datasets

Author: Huang Gary B.
Olbris Donald J.
Parag Toufiq
Plaza Stephen M.
Rivlin Patricia K.
Saunders Mathew A.
Publication date: 04/12/2014

Reconstructing neuronal circuits at the level of synapses is a central problem in neuroscience and is becoming a focus of the emerging field of connectomics. To date, electron microscopy (EM) is the most proven technique for identifying and quantifying synaptic connections. As advances in EM make acquiring larger datasets possible, subsequent manual synapse identification (i.e., proofreading) for deciphering a connectome becomes a major time bottleneck. Here we introduce a large-scale, high-throughput, and semi-automated methodology to efficiently identify synapses. We successfully applied our methodology to the Drosophila medulla optic lobe, annotating many more synapses than previous connectome efforts. Our approaches are extensible and will make the often complicated process of synapse identification accessible to a wider community of potential proofreaders.

arXiv.org e-Print Archive

CiteSeerX

The Limitations of Stock Market Efficiency: Price Informativeness and CEO Turnover

Author: Gary B. Gorton
Lixin Huang
Qiang Kang

Stock prices are more informative when the information has less social value. Speculators with limited resources making costly (private) information production decisions must decide to produce information about some firms and not others. We show that producing and trading on private information is most profitable in the stocks of firms with poor corporate governance -- precisely because it will not be acted upon -- and less profitable at firms with better corporate governance. To the extent that the information in the stock price is used for disciplining the CEO by the board of directors, the informed trader has a reduced incentive to produce the information in the first place. We test our model using the probability of informed trading (PIN) and the probability of forced CEO turnover in a simultaneous-equation system. The empirical results support the model predictions. Stock prices are efficient, but there is a limit to the disciplining role they can fulfill. We apply the model to evaluate the effects of the Sarbanes-Oxley Act of 2002.

Research Papers in Economics

A genetic screen for regulators of muscle morphogenesis in Drosophila

Author: Gontarz Paul
Huang Gary
Johnson Aaron N
Ou Tiffany
Skeath James B
Wilson Beth
Publication venue: Digital Commons@Becker
Publication date: 16/05/2021

The mechanisms that determine the final topology of skeletal muscles remain largely unknown. We have been developing Drosophila body wall musculature as a model to identify and characterize the pathways that control muscle size, shape, and orientation during embryogenesis (Johnson et al., 2013; Williams et al., 2015; Yang et al., 2020a; Yang et al., 2020b). Our working model argues that muscle morphogenesis is regulated by (1) extracellular guidance cues that direct muscle cells toward muscle attachment sites, and (2) contact-dependent interactions between muscles and tendon cells. While we have identified several pathways that regulate muscle morphogenesis, our understanding is far from complete. Here we report the results of a recent EMS-based forward genetic screen that identified a myriad of loci not previously associated with muscle morphogenesis. We recovered new alleles of known muscle morphogenesis genes, including back seat driver, kon-tiki, thisbe, and tumbleweed, arguing that our screen had the depth and precision to uncover myogenic genes. We also identified new alleles of spalt-major, barren, and patched that presumably disrupt independent muscle morphogenesis pathways. Equally important, our screen shows that at least 11 morphogenetic loci remain to be mapped and characterized. Our screen has developed exciting new tools to study muscle morphogenesis, which may provide future insights into the mechanisms that regulate skeletal muscle topology.

Digital Commons@Becker

PubMed Central